19 research outputs found

Data Mining-based Fragmentation of XML Data Warehouses

Author: Darmont Jérôme
Mahboubi Hadj
Publication date: 01/01/2008

With the multiplication of XML data sources, many XML data warehouse models have been proposed to handle data heterogeneity and complexity in a way relational data warehouses fail to achieve. However, XML-native database systems currently suffer from limited performance, both in terms of manageable data volume and response time. Fragmentation helps address both issues. Derived horizontal fragmentation is typically used in relational data warehouses and can be adapted to the XML context. However, the number of fragments produced by classical algorithms is difficult to control. In this paper, we propose a k-means-based fragmentation approach that controls the number of fragments through its k parameter. We experimentally compare its efficiency to classical derived horizontal fragmentation algorithms adapted to XML data warehouses and show its superiority.
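
The idea of bounding the number of fragments with k can be illustrated roughly as follows. This is a minimal sketch, not the authors' algorithm: it assumes that selection predicates taken from a query workload are described by 0/1 usage vectors (one column per query) and that each cluster of predicates seeds one fragment. The predicate names, the usage data, and the `kmeans` helper are all hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical workload: one row per selection predicate, one column per query;
# 1 means the query uses the predicate.
usage = {
    "year=2007": [1, 1, 0, 0],
    "country='FR'": [0, 0, 1, 1],
    "year=2008": [1, 1, 0, 0],
    "country='CA'": [0, 0, 1, 0],
}

predicates = list(usage)
labels = kmeans([usage[p] for p in predicates], k=2)

# Each cluster of predicates would define one fragment, so k caps the fragment count.
fragments = {}
for predicate, label in zip(predicates, labels):
    fragments.setdefault(label, []).append(predicate)

print(fragments)
```

Because every predicate lands in exactly one of the k clusters, the number of fragments cannot exceed k, whatever the size of the workload.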

arXiv.org e-Print Archive

CiteSeerX

A Join Index for XML Data Warehouses

Author: Aouiche Kamel
Darmont Jérôme
Mahboubi Hadj
Publication date: 01/01/2008

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native XML DBMSs can compete with XML-compatible, relational DBMSs when warehousing and analyzing XML data. Comment: 2008 International Conference on Information Resources Management (Conf-IRM 08), Niagara Falls, Canada (2008)
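
The abstract does not describe the structure of the index, so the following is only a generic illustration of how precomputing fact-to-dimension links can remove joins at query time while keeping every original fact field. The data, the field names, and the `build_join_index` function are hypothetical and are not taken from the paper.

```python
# Hypothetical star-like warehouse: facts reference dimension members by key.
facts = [
    {"id": "f1", "customer": "c1", "product": "p1", "amount": 120.0},
    {"id": "f2", "customer": "c2", "product": "p1", "amount": 80.0},
    {"id": "f3", "customer": "c1", "product": "p2", "amount": 50.0},
]
dimensions = {
    "customer": {"c1": {"country": "FR"}, "c2": {"country": "CA"}},
    "product": {"p1": {"category": "book"}, "p2": {"category": "music"}},
}


def build_join_index(facts, dimensions):
    """Resolve every fact-to-dimension link once, ahead of query time."""
    index = []
    for fact in facts:
        entry = dict(fact)  # keep all original fact fields
        for dim_name, members in dimensions.items():
            # Copy the referenced member's attributes next to the fact.
            for attr, value in members[fact[dim_name]].items():
                entry[f"{dim_name}.{attr}"] = value
        index.append(entry)
    return index


join_index = build_join_index(facts, dimensions)

# A decision-support style query now scans a single structure, with no join.
total_fr_books = sum(
    e["amount"]
    for e in join_index
    if e["customer.country"] == "FR" and e["product.category"] == "book"
)
print(total_fr_books)
```

The trade-off in such a design is storage: dimension attributes are repeated for every fact that references them.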

arXiv.org e-Print Archive

AIS Electronic Library (AISeL)

Optimisation de la performance des entrepôts de données XML par fragmentation et répartition (Performance optimization of XML data warehouses through fragmentation and distribution)

Author: Mahboubi Hadj
Publication venue: HAL CCSD
Publication date: 08/12/2008

XML data warehouses form an interesting basis for decision-support applications that exploit heterogeneous data from multiple sources. However, XML-native database systems currently suffer from limited performance, both in terms of manageable data volume and response time for complex analytical queries. It is therefore necessary to design methods to optimize performance. In this thesis, we propose to address both issues by fragmenting and distributing XML data warehouses on grids. To the best of our knowledge, we propose the first fragmentation methods for XML data warehouses. These methods exploit an XQuery workload and output a derived horizontal fragmentation schema. We first adapted the most efficient fragmentation methods from the relational context to XML, and then proposed an original k-means-based fragmentation method that allows controlling the number of fragments. We finally propose an approach for distributing XML data warehouses on grid architectures. Our proposals rely on a unified XML warehouse reference model that synthesizes and enhances related work from the literature. Finally, we experimentally validate our proposals and compare our fragmentation and distribution methods. For this purpose, we designed and developed an XML data warehouse benchmark, XWeB. Our results show that our methods help overcome the data volume and query execution time limitations. They also show that our k-means-based fragmentation method outperforms classical derived horizontal fragmentation methods, both in terms of performance gain and overhead.

Thèses en Ligne

HAL

Hal-Diderot

Indices in XML Databases

Author: Darmont Jérôme
Mahboubi Hadj
Publication venue: IGI Global
Publication date: 01/02/2009

With XML becoming a standard for business information representation and exchange, storing, indexing, and querying XML documents have rapidly become major issues in database research. In this context, query processing and optimization are essential, since native-XML databases are not yet mature. Data structures such as indices, which help enhance performance substantially, are extensively researched, especially since XML data have numerous specificities with respect to relational data. In this paper, we survey state-of-the-art XML indices and discuss the main issues, tradeoffs and future trends in XML indexing. We also present an index that we specifically designed for the particular architecture of XML data warehouses.

HAL

Query Performance Optimization in XML Data Warehouses

Author: Darmont Jérôme
Mahboubi Hadj
Publication venue: IGI Global
Publication date: 01/01/2010

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this chapter, we present two such techniques. First, we propose a join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. Second, we present a strategy for selecting XML materialized views by clustering the query workload. To validate these proposals, we measure the response time of a set of decision-support XQueries over an XML data warehouse, with and without our optimization techniques. Our experimental results demonstrate their efficiency, even when queries are complex and data are voluminous.

HAL